Drone-Based AI Infrared Human Detection System for Disaster Response: A Proof-of-Concept Simulation: Software Testing¶

1. Install Libraries:¶

In [1]:
!pip install opencv-python
!pip install keras
!pip install opencv-python-headless
Requirement already satisfied: opencv-python in c:\users\suned\anaconda3\lib\site-packages (4.8.0.76)
Requirement already satisfied: numpy>=1.21.2 in c:\users\suned\anaconda3\lib\site-packages (from opencv-python) (1.23.5)
Requirement already satisfied: keras in c:\users\suned\anaconda3\lib\site-packages (2.12.0)
Requirement already satisfied: opencv-python-headless in c:\users\suned\anaconda3\lib\site-packages (4.8.0.76)
Requirement already satisfied: numpy>=1.21.2 in c:\users\suned\anaconda3\lib\site-packages (from opencv-python-headless) (1.23.5)

2. Import Libraries:¶

In [2]:
import numpy as np
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import cv2
import os
import json
from IPython.display import display, clear_output, HTML
from tkinter import filedialog
from tkinter import Tk
import unittest
import subprocess
from unittest.mock import patch
import pytest

3. Note the Following Testing Assumptions:¶

This project serves as a software-based simulation designed to emulate real-world technology applications. Consider the sample thermal map and thermal video drone footage provided below as a representative example of a geographical area over which a drone would be directed to conduct a flight in an actual operational context, as captured through an infrared imaging system:

In [3]:
# Displaying the image using Matplotlib:
img_path = './Map smaller size.png'
img = mpimg.imread(img_path)
plt.imshow(img)
plt.axis('off')
plt.show()

# Displaying the video using HTML5 <video> tag:
video_path = 'test video.mp4'
video_html = f'<video width="640" height="360" controls><source src="test video.mp4" type="video/mp4">Your browser does not support the video tag.</video>'
display(HTML(video_html))
Your browser does not support the video tag.

Assumption 1: In a practical scenario, the drone is presumed to maintain a consistent altitude of 15 meters above the ground level.

Assumption 2: Given that this project involves a simulation and lacks access to an actual infrared camera, it is posited that the camera is assumed to have a horizontal field of view of 100 degrees, and a vertical field of view of 90 degrees. The footage will be taken at an aspect ratio of 4:3, since it it generally the most common for general-use thermal cameras.

Assumption 3: In a practical scenario, the drone will be programmed to consistently navigate over a rectangular region with an area of 12500 square meters. This corresponds to dimensions of 125 meters in length (vertical) and 100 meters in width (horizontal). The navigation will be facilitated through the utilization of established flight planning software, such as DJI GO or DJI Fly. It is important to note that the integration and operation of this software falls outside the scope of this project, as the focus here is solely on simulation.

Assumption 4: The drone is presumed to consistently follow the up and down flight path depicted in the subsequent image:

In [4]:
img_path = './flight path.png'

img = mpimg.imread(img_path)
plt.imshow(img)
plt.axis('off')
plt.show()

Assumption 5: The operational flight area, referred to as the "map," will be subdivided into smaller, distinct sections, forming a grid. This grid is consistently structured to comprise four rows and three columns, as depicted in the accompanying illustration.

The drone is programmed to initiate its flight path from Block 1. Utilizing its thermal/infrared camera, the drone will commence recording as it navigates through the predetermined path: starting from Block 1, proceeding to Block 2, then to Block 3, and continuing sequentially until it reaches Block 12. At each block, a single frame from the infrared video footage will be captured for further analysis. The frame will be extracted once the drone is in the middle of each block, as depicted by the dots in the image above.

Subsequent to the capture of each frame, the corresponding block will be meticulously scanned for the presence of human individuals. In the event that a human is detected within a block, the pixel coordinates of this individual will be identified and reported. Notably, these coordinates are provided in relation to the entire map, rather than the isolated, cropped section of the current block. In a real-world application, these pixel coordinates are intended to correspond with actual GPS coordinates.

Furthermore, each block within the grid is designed with specific dimensions: A horizontal length of 33,33 meters, a vertical length of 31,25 meters, and a total area of 1041,56 square meters.

Note: The maps used for simulation purposes are not up to scale.

Assumption 6: It has been roughly calculated that a drone flying 15 meters above ground, at the speed of 3 meters per second, it will take, on average, approximately 126.39 seconds (2.11 minutes) to record footage of the entire area, given that the entire flight path has been calculated to be 379.16 meters long.

4. Test If Human Detection Python Script Runs:¶

In [5]:
# Run the Human_Detection_Script.py script using subprocess
result = subprocess.run(['python', 'Human_Detection_Script.py'], capture_output=True, text=True)

# Check if the script ran successfully by examining the return code
if result.returncode == 0:
    print("Test passed: Script ran successfully.")
else:
    print(f"Test failed: Script returned error code {result.returncode}.")
Test passed: Script ran successfully.

5. Display Script Outputs:¶

In [6]:
%run Human_Detection_Script.py
Loaded video file 'test video.mp4' with size: 26.77 MB successfully.
Video file 'test video.mp4' opened successfully.
Original Duration: 126.38 seconds
Desired Duration: 126.39 seconds
Original FPS: 24
New FPS: 24.00
Processed video saved as output_video.mp4 with new duration of 126.39 seconds
Block 1: 5.21 seconds
Block 2: 15.62 seconds
Block 3: 26.04 seconds
Block 4: 37.15 seconds
Block 5: 47.57 seconds
Block 6: 57.98 seconds
Block 7: 68.4 seconds
Block 8: 79.51 seconds
Block 9: 89.93 seconds
Block 10: 100.34 seconds
Block 11: 110.76 seconds
Block 12: 121.87 seconds
Function 'grab_frames' defined successfully!
Function 'crop_to_aspect_ratio' defined successfully.
Calculated Aspect Ratio: 1.07
Frames grabbed successfully!
Bounding boxes for Block 1: [(36, 362, 118, 418), (960, 172, 1058, 226)]
Bounding boxes for Block 2: [(146, 714, 220, 828), (708, 4, 747, 27)]
Bounding boxes for Block 3: [(482, 492, 576, 606), (456, 526, 478, 548)]
Bounding boxes for Block 4: [(850, 508, 984, 624)]
Bounding boxes for Block 5: [(880, 746, 992, 828), (200, 26, 380, 100)]
Bounding boxes for Block 6: [(996, 970, 1098, 1076), (410, 160, 456, 188)]
Bounding boxes for Block 7: [(684, 276, 788, 334)]
Bounding boxes for Block 8: [(26, 786, 88, 836), (800, 670, 836, 734), (316, 296, 384, 354)]
Bounding boxes for Block 9: [(712, 682, 834, 726)]
Bounding boxes for Block 10: [(306, 882, 342, 914), (876, 552, 914, 612), (208, 154, 322, 206)]
Bounding boxes for Block 11: [(172, 378, 228, 466)]
Bounding boxes for Block 12: [(278, 716, 352, 806), (772, 0, 870, 28)]
Total number of humans: 22
Map dimensions: Width = 3453 pixels, Height = 4320 pixels
Human 1 detected with global pixel coordinates: (36, 362).
Human 2 detected with global pixel coordinates: (960, 172).
Human 3 detected with global pixel coordinates: (146, 1794).
Human 4 detected with global pixel coordinates: (708, 1084).
Human 5 detected with global pixel coordinates: (482, 2652).
Human 6 detected with global pixel coordinates: (456, 2686).
Human 7 detected with global pixel coordinates: (850, 3748).
Human 8 detected with global pixel coordinates: (2031, 3986).
Human 9 detected with global pixel coordinates: (1351, 3266).
Human 10 detected with global pixel coordinates: (2147, 3130).
Human 11 detected with global pixel coordinates: (1561, 2320).
Human 12 detected with global pixel coordinates: (1835, 1356).
Human 13 detected with global pixel coordinates: (1177, 786).
Human 14 detected with global pixel coordinates: (1951, 670).
Human 15 detected with global pixel coordinates: (1467, 296).
Human 16 detected with global pixel coordinates: (3014, 682).
Human 17 detected with global pixel coordinates: (2608, 1962).
Human 18 detected with global pixel coordinates: (3178, 1632).
Human 19 detected with global pixel coordinates: (2510, 1234).
Human 20 detected with global pixel coordinates: (2474, 2538).
Human 21 detected with global pixel coordinates: (2580, 3956).
Human 22 detected with global pixel coordinates: (3074, 3240).

6. Test Video File Path Handling and Video File Size Calculations:¶

In [7]:
def test_video_upload():
    # Test case 1: File exists (Using mock):
    with patch('os.path.exists', return_value=True), \
         patch('os.path.getsize', return_value=5000000), \
         patch('sys.argv', ['script_name', 'existing_file.mp4']):
        
        video_file_path = None
        display_output = True
        file_size = None

        if len(sys.argv) > 1:
            video_file_path = sys.argv[1]
        else:
            video_file_path = "test video.mp4"

        if os.path.exists(video_file_path):
            file_size = os.path.getsize(video_file_path) / (1024 * 1024)
            if display_output: 
                print(f"Loaded video file '{video_file_path}' with size: {file_size:.2f} MB successfully.")

        assert video_file_path == 'existing_file.mp4'
        assert file_size == expected_file_size  # This would be your expected file size

    # Test case 2: File does exist (Using real file):
    with patch('sys.argv', ['script_name']):
        
        video_file_path = None
        display_output = True
        file_size = None

        if len(sys.argv) > 1:
            video_file_path = sys.argv[1]
        else:
            video_file_path = "test video.mp4"

        if os.path.exists(video_file_path):
            file_size = os.path.getsize(video_file_path) / (1024 * 1024)
            if display_output: 
                print(f"Loaded video file '{video_file_path}' with size: {file_size:.2f} MB successfully.")

        assert video_file_path == 'test video.mp4'
        assert file_size is not None  # Because the file actually exists

# Defining the expected_file_size value for the mock:
expected_file_size = 5000000 / (1024 * 1024)

# Run the test
test_video_upload()
Loaded video file 'existing_file.mp4' with size: 4.77 MB successfully.
Loaded video file 'test video.mp4' with size: 26.77 MB successfully.

7. Test Video Length Corrrection:¶

In [8]:
# Define tolerance for duration check (in seconds)
tolerance = 0.1

# Open the output video to check its properties
check_cap = cv2.VideoCapture(output_video_path)

if not check_cap.isOpened():
    print(f"Test Failed: Couldn't open the processed video file '{output_video_path}'.")
else:
    print(f"Test: Processed video file '{output_video_path}' opened successfully.")

    # Get properties of the output video
    check_video_length = int(check_cap.get(cv2.CAP_PROP_FRAME_COUNT))
    check_fps = int(check_cap.get(cv2.CAP_PROP_FPS))
    check_duration = check_video_length / check_fps

    # Check if the duration is close to the desired duration
    if abs(check_duration - desired_duration) <= tolerance:
        print(f"Test Passed: Desired duration {desired_duration:.2f} is close to actual duration {check_duration:.2f}.")
    else:
        print(f"Test Failed: Desired duration {desired_duration:.2f} is not close to actual duration {check_duration:.2f}.")

    check_cap.release()
Test: Processed video file 'output_video.mp4' opened successfully.
Test Failed: Desired duration 126.39 is not close to actual duration 131.87.

8. Test Time Stamps Calculations Function:¶

In [9]:
def test_calculate_timestamps():
    total_time = 126.39  # in seconds
    speed = 3.0  # in meters per second
    vertical_length = 31.25  # in meters
    horizontal_length = 33.33  # in meters
    path = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]  # The order of blocks

    expected_output = [
        ("Block 1", 5.21),
        ("Block 2", 15.62),
        ("Block 3", 26.04),
        ("Block 4", 37.15),
        ("Block 5", 47.57),
        ("Block 6", 57.98),
        ("Block 7", 68.4),
        ("Block 8", 79.51),
        ("Block 9", 89.93),
        ("Block 10", 100.34),
        ("Block 11", 110.76),
        ("Block 12", 121.87)
    ]

    calculated_output = calculate_timestamps(total_time, speed, vertical_length, horizontal_length, path)
    
    if calculated_output == expected_output:
        print("Test passed: The function produced the expected output.")
    else:
        print("Test failed: The function did not produce the expected output.")
        print("Expected output:", expected_output)
        print("Calculated output:", calculated_output)

# Run the test
test_calculate_timestamps()
Test passed: The function produced the expected output.

9. Test Human Detection Function:¶

In [10]:
# Load the JSON data from the file
with open("location test.json", "r") as file:
    json_data = json.load(file)

# Extract the coordinates
json_coordinates = [(region['shape_attributes']['x'], region['shape_attributes']['y']) for region in json_data["Coordinates Testing Map.jpg14727562"]["regions"]]

THRESHOLD = 50 

# Counting of predicted boxes that had a match in the JSON ground truth data:
matched_predictions = 0

# List to keep track of which boxes in JSON data had a match:
matched_boxes_json = [False] * len(json_coordinates)

detected_coords = []  # list to store (x, y) coordinates of detected humans
ground_truth_coords = []  # list to store (x, y) coordinates of matching ground truth

def find_closest_match(coordinates, json_coordinates):
    closest_idx = -1
    min_distance = float("inf")
    
    for idx, (json_x, json_y) in enumerate(json_coordinates):
        distance = np.sqrt((json_x - coordinates[0])**2 + (json_y - coordinates[1])**2)
        if distance < min_distance:
            closest_idx = idx
            min_distance = distance
            
    return closest_idx, min_distance

for x, y, _, _ in global_bboxes:
    idx, distance = find_closest_match((x, y), json_coordinates)
    match = json_coordinates[idx]
    
    if distance < THRESHOLD:
        matched_predictions += 1
        matched_boxes_json[idx] = True
        detected_coords.append((x, y))
        ground_truth_coords.append(match)
        print(f"The detected human at position ({x}, {y}) closely aligns with the ground truth at {match} with a distance of {distance:.2f} pixels.")
    else:
        print(f"The detected human at position ({x}, {y}) does not have a close alignment with any ground truth.")
The detected human at position (36, 362) closely aligns with the ground truth at (57, 373) with a distance of 23.71 pixels.
The detected human at position (960, 172) closely aligns with the ground truth at (958, 189) with a distance of 17.12 pixels.
The detected human at position (146, 1794) closely aligns with the ground truth at (162, 1793) with a distance of 16.03 pixels.
The detected human at position (708, 1084) closely aligns with the ground truth at (716, 1093) with a distance of 12.04 pixels.
The detected human at position (482, 2652) closely aligns with the ground truth at (466, 2644) with a distance of 17.89 pixels.
The detected human at position (456, 2686) closely aligns with the ground truth at (466, 2644) with a distance of 43.17 pixels.
The detected human at position (850, 3748) does not have a close alignment with any ground truth.
The detected human at position (2031, 3986) closely aligns with the ground truth at (2024, 3962) with a distance of 25.00 pixels.
The detected human at position (1351, 3266) closely aligns with the ground truth at (1352, 3254) with a distance of 12.04 pixels.
The detected human at position (2147, 3130) closely aligns with the ground truth at (2142, 3125) with a distance of 7.07 pixels.
The detected human at position (1561, 2320) closely aligns with the ground truth at (1561, 2326) with a distance of 6.00 pixels.
The detected human at position (1835, 1356) closely aligns with the ground truth at (1829, 1371) with a distance of 16.16 pixels.
The detected human at position (1177, 786) does not have a close alignment with any ground truth.
The detected human at position (1951, 670) does not have a close alignment with any ground truth.
The detected human at position (1467, 296) does not have a close alignment with any ground truth.
The detected human at position (3014, 682) closely aligns with the ground truth at (2997, 691) with a distance of 19.24 pixels.
The detected human at position (2608, 1962) closely aligns with the ground truth at (2601, 1953) with a distance of 11.40 pixels.
The detected human at position (3178, 1632) closely aligns with the ground truth at (3158, 1627) with a distance of 20.62 pixels.
The detected human at position (2510, 1234) closely aligns with the ground truth at (2502, 1236) with a distance of 8.25 pixels.
The detected human at position (2474, 2538) closely aligns with the ground truth at (2472, 2526) with a distance of 12.17 pixels.
The detected human at position (2580, 3956) closely aligns with the ground truth at (2572, 4005) with a distance of 49.65 pixels.
The detected human at position (3074, 3240) does not have a close alignment with any ground truth.

10. Evaluation Metrics: Precision, Recall, Location Accuracy, and Human Count:¶

In [11]:
# Calculating precision and recall:
precision = matched_predictions / len(global_bboxes) if len(global_bboxes) > 0 else 0
recall = sum(matched_boxes_json) / len(json_coordinates) if len(json_coordinates) > 0 else 0

# Calculating the average deviation:
deviations = []
for x, y, _, _ in global_bboxes:
    _, distance = find_closest_match((x, y), json_coordinates)
    deviations.append(distance)
average_deviation = sum(deviations) / len(deviations) if deviations else 0

# Calculating F1 Score:
if precision + recall != 0:
    f1_score = 2 * (precision * recall) / (precision + recall)
else:
    f1_score = 0

# Calculating Maximum Deviation:
max_deviation = max(deviations) if deviations else 0

# Counting total number of people in JSON data:
total_people_in_json = len(json_coordinates)

# Comparing the two totals and output results:
print(f"Total number of humans detected by the detection function: {original_detected_humans}")
print(f"Total number of humans according to the real ground truth data: {total_people_in_json}")
if original_detected_humans == total_people_in_json:
    print("Both counts match!")
else:
    difference = abs(original_detected_humans - total_people_in_json)
    print(f"There's a difference of {difference} between the detected count and the real ground truth count.")

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"Average Location Deviation: {average_deviation:.2f} pixels")
print(f"F1 Score: {f1_score:.2f}")
print(f"Maximum Location Deviation: {max_deviation:.2f} pixels")
Total number of humans detected by the detection function: 22
Total number of humans according to the real ground truth data: 21
There's a difference of 1 between the detected count and the real ground truth count.
Precision: 0.77
Recall: 0.76
Average Location Deviation: 29.77 pixels
F1 Score: 0.77
Maximum Location Deviation: 76.66 pixels

11. Visualize the Difference Between Ground Truth (real) Coordinates and Detected Coordinates:¶

In [12]:
# Extracting x and y coordinates for plotting:
detected_x = [coord[0] for coord in detected_coords]
detected_y = [coord[1] for coord in detected_coords]
ground_truth_x = [coord[0] for coord in ground_truth_coords]
ground_truth_y = [coord[1] for coord in ground_truth_coords]

# Plotting:
plt.scatter(detected_x, detected_y, label='Detected Coordinates', marker='o', color='blue')
plt.scatter(ground_truth_x, ground_truth_y, label='Ground Truth Coordinates', marker='x', color='red')
plt.xlabel('X Coordinate Value')
plt.ylabel('Y Coordinate Value')
plt.title('Comparison between Ground Truth and Detected Coordinates')
plt.legend()
plt.grid(True)
plt.show()